skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Creators/Authors contains: "Yang, Haoran"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Free, publicly-accessible full text available August 13, 2026
  2. Increasing studies have shown bugs in multi-language software as a critical loophole in modern software quality assurance, especially those induced by language interactions (i.e., multilingual bugs). Yet existing tool support for bug detection/localization remains largely limited to single-language software, despite the long-standing prevalence of multi-language systems in various real-world software domains. Extant static/dynamic analysis and deep learning (DL) based approaches all face major challenges in addressing multilingual bugs. In this paper, we present xLoc, a DL-based technique/tool for detecting and localizing multilingual bugs. Motivated by results of our bug-characteristics study on top locations of multilingual bugs, xLoc first learns the general knowledge relevant to differentiating various multilingual control-flow structures. This is achieved by pre-training a Transformer model with customized position encoding against novel objectives. Then, xLoc learns task-specific knowledge for the task of multilingual bug detection/localization, through another new position encoding scheme (based on cross-language API vicinity) that allows for the model to attend particularly to control-flow constructs that bear most multilingual bugs during fine-tuning. We have implemented xLoc for Python-C software and curated a dataset of 3,770 buggy and 15,884 non-buggy Python-C samples, which enabled our extensive evaluation of xLoc against two state-of-the-art baselines: fine-tuned CodeT5 and zero-shot ChatGPT. Our results show that xLoc achieved 94.98% F1 and 87.24%@Top-1 accuracy, which are significantly (up to 162.88% and 511.75%) higher than the baselines. Ablation studies further confirmed significant contributions of each of the novel design elements in xLoc. With respective bug-location characteristics and labeled bug datasets for fine-tuning, our design may be applied to other language combinations beyond Python-C. 
    more » « less
  3. Developing software projects that incorporate multiple languages has been a prevalent practice for many years. However, the issues encountered by developers during the development process, the underlying challenges causing these issues, and the solutions provided to developers remain unknown. In this paper, our objective is to provide answers to these questions by conducting a study on developer discussions on Stack Overflow (SO). Through a manual analysis of 586 highly relevant posts spanning 14 years, we revealed that multilingual development is a highly and sustainably active topic on SO, with older questions becoming inactive and newer ones getting first asked (and then mostly remaining active for more than one year). From these posts, we observed a diverse array of issues (11 categories), primarily centered around interfacing and data handling across different languages. Our analysis suggests that error/exception handling issues were the most difficult to resolve among those issue categories, while security related issues were most likely to receive an accepted answer. The primary challenge faced by developers was the complexity and diversity inherent in building multilingual code and ensuring interoperability. Additionally, developers often struggled due to a lack of technical expertise on the varied features of different programming languages (e.g., threading and memory management mechanisms). In addition, properly handling message passing across languages constituted a key challenge with using implicit language interfacing. Notably, Stack Overflow emerged as a crucial source of solutions to these challenges, with the majority (73%) of the posts receiving accepted answers, most within a week (36.5% within 24 hours and 25% in the following six days). Based on our analysis results, we have formulated actionable insights and recommendations that can be utilized by researchers and developers in this field. 
    more » « less
  4. For many years now, modern software is known to be developed in multiple languages (hence termed asmultilingualormulti-languagesoftware). Yet, to date, we still only have very limited knowledge about how multilingual software systems are constructed. For instance, it is not yet really clear how different languages are used, selected together, and why they have been so in multilingual software development. Given the fact that using multiple languages in a single software project has become a norm, understanding language use and selection (i.e.,language profile) as a basic element of themultilingual constructionin contemporary software engineering is an essential first step. In this article, we set out to fill this gap with a large-scale characterization study on language use and selection in open-source multilingual software. We start with presentingan updated overviewof language use in 7,113 GitHub projects spanning the 5 past years by characterizing overall statistics of language profiles, followed bya deeper lookinto the functionality relevance/justification of language selection in these projects through association rule mining. We proceed with an evolutionary characterization of 1,000 GitHub projects for each of the 10 past years to providea longitudinal viewof how language use and selection have changed over the years, as well as how the association between functionality and language selection has been evolving. Among many other findings, our study revealed a growing trend of using three to five languages in one multilingual software project and the noticeable stableness of top language selections. We found a non-trivial association between language selection and certain functionality domains, which was less stable than that with individual languages over time. In a historical context, we also have observed major shifts in these characteristics of multilingual systems both in contrast to earlier peer studies and along the evolutionary timeline. Our findings offer essential knowledge on the multilingual construction in modern software development. Based on our results, we also provide insights and actionable suggestions for both researchers and developers of multilingual systems. 
    more » « less